Automatic Threshold Estimation for Data Matching Applications
Authors
Abstract
Several advanced data management applications, such as data integration, data deduplication, and similarity querying, rely on similarity functions. A similarity function requires a threshold value to decide whether two data instances match, i.e., whether they represent the same real-world object. Defining this threshold is therefore a central problem. In this paper, we propose a method for estimating the quality of a similarity function. Quality is measured in terms of recall and precision, calculated at several different thresholds. Based on the results of the proposed estimation process and the requirements of a specific application, a user can choose a threshold value that is adequate for that application. The estimation process is based on a clustering phase performed on a sample taken from a data collection and requires no human intervention.
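To make the idea of measuring precision and recall at several thresholds concrete, here is a minimal Python sketch. It is not the paper's estimation procedure, which the abstract does not detail. The function names, the toy similarity scores, and the hard-coded cluster labels are all assumptions made for illustration; the labels stand in for whatever the clustering phase would produce on a sample.

```python
from itertools import combinations


def pairs_from_clusters(labels):
    """Return the set of record pairs that share a cluster label.

    `labels` maps a record id to a cluster id (hypothetical clustering output).
    """
    ids = sorted(labels)
    return {(a, b) for a, b in combinations(ids, 2) if labels[a] == labels[b]}


def precision_recall_curve(scores, reference_matches, thresholds):
    """Compute (threshold, precision, recall) for each candidate threshold.

    `scores` maps a record pair to its similarity value.
    `reference_matches` is the set of pairs treated as true matches.
    """
    curve = []
    for t in thresholds:
        # A pair is predicted to match when its score reaches the threshold.
        predicted = {pair for pair, s in scores.items() if s >= t}
        true_positives = len(predicted & reference_matches)
        # Convention assumed here: an empty set yields a value of 1.0.
        precision = true_positives / len(predicted) if predicted else 1.0
        recall = (
            true_positives / len(reference_matches) if reference_matches else 1.0
        )
        curve.append((t, precision, recall))
    return curve


# Toy sample: four records, two assumed clusters.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}

# Toy similarity scores for every pair in the sample (made-up values).
scores = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.6,
    ("a", "d"): 0.2,
    ("b", "c"): 0.3,
    ("b", "d"): 0.1,
    ("c", "d"): 0.7,
}

reference = pairs_from_clusters(labels)
curve = precision_recall_curve(scores, reference, [0.5, 0.65, 0.8])

for threshold, precision, recall in curve:
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With a table like this, an application that favors precision would pick a higher threshold, while one that favors recall would pick a lower one.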
Similar resources
A New Structural Matching Method Based on Linear Features for High Resolution Satellite Images
With the commercial availability of high resolution satellite images in recent decades, the extraction of accurate 3D spatial information has become the centre of attention in many fields, and applications related to photogrammetry and remote sensing have increased. To extract such information, the images should be geo-referenced. The procedure of georeferencing is done in four main steps...
Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching
Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...
Learning Threshold Parameters for Event Classification in Broadcast News
In this paper we present two methods for automatic threshold parameter estimation for an event tracking algorithm. We view the threshold as a statistic of the incoming data stream, which is assumed to contain broadcast news stories from radio, television, and newswire sources. Query bias defined in terms of threshold estimators can be identified when a word co-occurrence representation for text ...
Measuring quality of similarity functions in approximate data matching
This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The fir...
Automatic Bounding Estimation in Modified NLMS Algorithm
Modified Normalized Least Mean Square (MNLMS) algorithm, which is a sign form of NLMS based on set-membership (SM) theory in the class of optimal bounding ellipsoid (OBE) algorithms, requires a priori knowledge of error bounds that is unknown in most applications. In a special but popular case of measurement noise, a simple algorithm has been proposed. With some simulation examples the performa...
Journal title:
- Inf. Sci.
Volume 181, Issue
Pages -
Publication date 2008